Methods for creating hierarchical copies

ABSTRACT

A method for copying a logical volume in a data storage system includes forming a first logical volume, storing in physical storage of the data storage system a quantity of data of the first logical volume, and receiving a first command to copy the first logical volume to a second logical volume. In response to the first command, meta-data is formed having a size that is independent of the quantity of the data. In response to a second command to access the data, the meta-data is used to access the data.

FIELD OF THE INVENTION

The present invention relates generally to methods and apparatus for data storage. More particularly, the present invention relates to methods and apparatus for managing copies of logical volumes in data storage systems.

BACKGROUND OF THE INVENTION

Data storage systems generally store data on physical media in a manner that is transparent to host computers. From the perspective of a host computer, data is stored at logical addresses located on file systems, or logical volumes. Logical volumes are typically configured to store the data required for a specific data processing application. Data storage systems map such logical addresses to addressable physical locations on storage media, such as direct access hard disks. In a typical configuration, physical locations comprise tracks on a hard disk. A track can typically store many blocks of data. System administrators frequently need to make copies of logical volumes in order to perform backups or to test and validate new applications. Data storage systems may implement the copying tasks without physically copying the data. Prior art for such implementation generally refers to the process as “instant copying.” When a logical copy is made, data only needs to be written physically when a portion of one of the copies is modified.

U.S. Pat. No. 6,779,094 to Selkirk, et al., whose disclosure is incorporated herein by reference, describes various instant copy mechanisms for copying data upon receiving a write operation to either original or copy data. Upon receiving a write operation for writing new data to a first data location, new data is written to a second data location. Multiple layers of mapping tables provide unique identification of the storage location of the data such that individual entries in the mapping tables are variable and may be self-defining.

U.S. Pat. No. 6,779,095 to Selkirk, et al., whose disclosure is incorporated herein by reference, describes the use of a plurality of layers of mapping tables for storing data. The mapping tables provide unique identification of location of the data. When the data is copied, the physical placement of the original data is described by a mapping mechanism known as the original data map. This identifies the physical storage location used to store the original data. The physical placement of the copy data is described by a mapping mechanism known as the copy data map. This identifies the physical storage location used to store the copy data.

U.S. Patent Publications 2003/0195887 and 2003/0208463 to Vishlitzky, et al., whose disclosures are incorporated herein by reference, describe a storage device containing a first storage area of a first type containing data and a second storage area of a second type containing a table of pointers to data provided in the storage area of the first type. The second storage area is a virtual storage area containing no sections of data and represents a copy of data of the first storage area at a point in time.

U.S. Pat. No. 6,820,099 to Huber, et al., whose disclosure is incorporated herein by reference, describes the use of a snapshot volume to update a primary, or “base,” logical volume. Updates are made to the snapshot volume while the base volume is still used to satisfy normal data access requests. After the updating of the snapshot is complete, the snapshot is rolled back to the base volume. During rollback, updated data are available from either the snapshot or from the base volume, and thus the updating appears to be instantaneous.

U.S. Pat. No. 6,687,718 to Gagne, et al., whose disclosure is incorporated herein by reference, describes transferring data from a data altering apparatus, such as a production data processing site, to a remote data receiving site. A data storage facility includes a first data store for recording each change in the data generated by the data altering apparatus. A register set records each change on a track-by-track basis. A second data store has first and second operating modes. During a first operating mode the second data store becomes a mirror of the first data store. During a second operating mode the second data store ceases to act as a mirror and becomes a source for a transfer of data to the data receiving site. Only information that has been altered, i.e., specific tracks that have been altered, is transferred during successive operations in the second operating mode.

U.S. Pat. No. 6,513,102 to Garrett, et al., whose disclosure is incorporated herein by reference, describes a system for transferring data from a first storage device, accessible to a first command processor, to a second storage device accessible to a second command processor but not necessarily to the first processor. In this aspect of the invention, the transfer is made internally of the storage controller rather than requiring the command processors to communicate directly with each other.

U.S. Pat. No. 6,742,138 to Gagne, et al., whose disclosure is incorporated herein by reference, describes a data recovery program that restores data in a first storage device using data from a second storage device. The program also updates the first storage device with data supplied from a host.

U.S. Pat. No. 6,574,703 to Don, et al., whose disclosure is incorporated herein by reference, describes a method for initializing an extent on a mass storage device having at least one track. The method preserves data in a track from being overwritten, and indicates that the data of the track is to be replaced. The method also associates an initialization code with the track indicating that the track is to be initialized.

U.S. Patent Publication 2003/0195864 to Vishlitzky, et al., whose disclosure is incorporated herein by reference, describes providing storage areas of a multiplicity of types that contain sections of data. Pointers are provided that are claimed to allow access or not to allow access to the data.

U.S. Pat. No. 6,839,827 to Beardsley, et al., whose disclosure is incorporated herein by reference, describes a method for mapping logical blocks to physical storage blocks. A storage controller defines the logical storage space as a sequence of logical chunks, wherein each logical chunk comprises a plurality of logical blocks in the logical storage space. The storage controller further defines a physical storage space as a sequence of physical chunks, wherein each physical chunk comprises a plurality of physical blocks in the physical storage system. The storage controller associates each logical chunk in the sequence of logical chunks defining the logical storage space with one physical chunk in the physical storage system. Further, the contiguous logical chunks are capable of being associated with non-contiguous physical chunks.

U.S. Pat. No. 6,088,764 to Shyam, et al., whose disclosure is incorporated herein by reference, describes a method for reducing space allocation failures in a computer system that utilizes direct access storage devices to store data. The method comprises the steps of determining if authorization has been given to attempt to allocate an initial space request over more than one volume, and, if so, attempting to allocate space on a plurality of volumes. If the initial space request cannot be allocated on a plurality of volumes, the initial space request is reduced by a preset percentage, an extent limit is removed and an attempt is made to allocate the reduced space request on the plurality of volumes.

U.S. Pat. No. 5,897,661 to Baranovsky, et al., whose disclosure is incorporated herein by reference, describes an apparatus providing a logical unit of undivided data storage that spans physical storage device boundaries. The apparatus manages the logical unit of undivided storage using metadata information stored on the physical storage devices. Advantageously, the apparatus replicates a minimum portion of the metadata information across all of the data storage devices and favors writing metadata only in the devices where the information is required to operate. In a preferred embodiment, a logical unit of undivided storage is created by defining a logical volume and allocating portions of available physical data storage devices thereto in order to provide a minimum logical volume size. Metadata is generated and stored on the data storage devices to provide detailed information about the portions of each data storage device that have been allocated to the logical volume.

A paper by Kang, et al., “Virtual Allocation: A Scheme for Flexible Storage Allocation,” published at the OASIS Workshop, Boston, Mass., Oct. 9-13, 2004, and available at http://ee.tamu.edu/~swkang/doc/va.pdf, is incorporated herein by reference. The paper describes physical storage allocation strategies that provide large shared areas with virtual storage for multiple file systems.

A paper by Wilson, et al., “Dynamic Storage Allocation: A Survey and Critical Review,” published in Proceedings of the 1995 International Workshop on Memory Management, Kinross, Scotland, UK, Sep. 27-29, 1995, Springer Verlag LNCS, and available at the website www.cs.northwestern.edu/~pdinda/ics-f02/doc/dsa.pdf, is incorporated herein by reference. The paper covers techniques for dynamic allocation of physical storage, or heap storage.

SUMMARY OF THE INVENTION

Embodiments of the present invention provide efficient methods and apparatus for creating logical data copies in a data storage system.

In embodiments of the present invention, a data storage system receives, typically from a host computer, a first input/output (I/O) command to store data of a first logical volume. The storage system stores the data at one or more physical locations, which the storage system associates with one or more respective logical partitions of the first logical volume.

To generate a logical volume copy, the storage system receives a management copy command to copy the first logical volume, referred to herein as a source volume, to a second logical volume, referred to herein as a target volume. In response to the command, the storage system designates the first logical volume and the second logical volume as branch volumes of a meta-volume, and associates physical locations previously associated with partitions of the first logical volume with the meta-volume. The storage system thereby links the first and second logical volumes to the same physical locations. Subsequently, the storage system treats the first and second logical volumes in the same manner when performing I/O and management commands, without differentiating the first logical volume as the original source of the data.

In embodiments of the present invention, configuration records, comprising translation records and a meta-volume record, are used to implement the relationship between the meta-volume and the branch volumes. A translation record is used to switch the association of the physical locations from the first logical volume to the meta-volume. Prior to receiving the management copy command, the data storage system typically creates translation records that designate a first alias for the first logical volume. When a subsequent copy command is received, the first alias is assigned to the meta-volume, and a second, different alias is assigned to the first logical volume. Partition descriptor records (PDRs), comprised in the configuration records, link the first alias to the physical locations at which the data are stored. Thus, by assigning the first logical volume alias to the meta-volume, all physical locations previously associated with the first logical volume are immediately associated with the meta-volume. Consequently, when the copy command is performed, only a small set of configuration records, comprising a translation record and a meta-volume record, is created, enabling the copy process to be substantially instantaneous.

Subsequently, when the data storage system receives a second I/O command to access data in a logical partition of the first volume or of the second volume, the data storage system uses the PDR linked to the meta-volume to identify the required physical location. When the data storage system receives a third I/O command to write data in one of the logical partitions, the data is written to a new physical storage location and a new PDR is created, linked to the appropriate logical volume rather than to the meta-volume.
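By way of illustration only, the read and write behavior summarized above may be sketched as follows. The sketch is a minimal in-memory model, not the implementation of any embodiment. The dictionaries `alias_of`, `parent_of`, and `pdrs`, the functions `read_location` and `write_partition`, and the physical address PZZZ01 are hypothetical names introduced for the example. The lookup order shown (the volume's own alias first, then the alias of its meta-volume) is an assumption. An embodiment described hereinbelow seeks first the PDR of the most superior meta-volume.

```python
# Hypothetical state after logical volume V1 has been copied to V3:
# the meta-volume MV1 holds the alias A1 that V1 used before the copy.
alias_of = {"MV1": "A1", "V1": "A4", "V3": "A3"}
parent_of = {"V1": "MV1", "V3": "MV1", "MV1": None}

# PDRs keyed by (alias, partition ID); the value is a physical address.
pdrs = {("A1", "P1"): "PYYY01"}


def read_location(volume, partition):
    """Return the physical address for a partition, walking up to the
    meta-volume when the volume has no PDR of its own (assumed order)."""
    node = volume
    while node is not None:
        location = pdrs.get((alias_of[node], partition))
        if location is not None:
            return location
        node = parent_of[node]
    return None  # nothing has been written to this partition


def write_partition(volume, partition, new_location):
    """Record a write: the new PDR is linked to the writing volume's own
    alias, so the PDR of the meta-volume is left untouched."""
    pdrs[(alias_of[volume], partition)] = new_location


# Before any write, both branch volumes resolve to the shared location.
shared = (read_location("V1", "P1"), read_location("V3", "P1"))

# After V3 writes P1, only V3 resolves to the new location.
write_partition("V3", "P1", "PZZZ01")
after_write = (read_location("V1", "P1"), read_location("V3", "P1"))
```

In this model a copy adds no PDRs at all, and a write adds exactly one PDR for the partition written.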

There is therefore provided, according to an embodiment of the present invention, a method for copying a logical volume in a data storage system, including:

forming a first logical volume having one or more logical partitions;

storing data at a physical location associated with the one or more logical partitions;

receiving a first command to copy the first logical volume to a second logical volume;

responsively to the first command, assigning the one or more logical partitions to a meta-volume and linking the meta-volume to the first and second logical volumes;

receiving a second command to access the data in at least one of the first and second logical volumes;

responsively to the second command, using a partition descriptor record associated with the meta-volume to identify the physical location; and

providing access to the data at the physical location.

Typically, the partition descriptor record includes a first partition descriptor record and providing access to the data includes:

responsively to a third command to modify the data in at least one of the first and second logical volumes, writing modified data to a further physical location and creating a second partition descriptor record identifying the further physical location.

The method may also include deleting the first partition descriptor record responsively to creating the second partition descriptor record, and deleting the meta-volume responsively to deleting the first partition descriptor record.

Forming the first logical volume may include assigning a first alias to the first logical volume, and assigning the one or more logical partitions to the meta-volume may include assigning the first alias to the meta-volume and assigning a second alias different from the first alias to the first logical volume. A third alias different from the first and second aliases may be assigned to the second logical volume.

In an embodiment, the method may further include:

responsively to a third command to copy the first logical volume to a third logical volume, assigning the second alias to a further meta-volume; and

responsively to a fourth command to access the data in at least one of the first and third logical volumes, iteratively seeking the partition descriptor record associated with a superior meta-volume.

The superior volume may include a most superior meta-volume, and iteratively seeking the partition descriptor record may include first seeking the partition descriptor record of the most superior meta-volume.

In an alternative embodiment, forming the first logical volume includes specifying a size for the logical volume greater than a physical data space available to the data storage system.

Typically, the first command further includes a command to copy the first logical volume to a third logical volume, and linking the meta-volume includes linking the meta-volume to the third logical volume.

Forming the first logical volume may also include specifying a size for the logical volume less than or equal to a physical data space available to the data storage system.

There is further provided, according to an embodiment of the present invention, apparatus for copying a logical volume in a data storage system, the apparatus including:

a control unit, which is adapted to:

form a first logical volume having one or more logical partitions,

store data at a physical location associated with the one or more logical partitions,

receive a first command to copy the first logical volume to a second logical volume,

responsively to the first command, assign the one or more logical partitions to a meta-volume and link the meta-volume to the first and second logical volumes,

receive a second command to access the data in at least one of the first and second logical volumes,

responsively to the second command, use a partition descriptor record associated with the meta-volume to identify the physical location, and

provide access to the data at the physical location.

The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a data storage system, in accordance with an embodiment of the present invention;

FIG. 2 is a schematic diagram of a cache in the data storage system of FIG. 1, in accordance with an embodiment of the present invention;

FIG. 3 is a flowchart of a process implemented when a copy volume command is received by the data storage system of FIG. 1, in accordance with an embodiment of the present invention;

FIG. 4 is a flowchart of a process implemented when a data read command is received by the data storage system of FIG. 1, in accordance with an embodiment of the present invention;

FIG. 5 is a flowchart of a process implemented when a data write command is received by the data storage system of FIG. 1, in accordance with an embodiment of the present invention; and

FIG. 6 is an exemplary diagram of hierarchies of meta-volumes, in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION OF EMBODIMENTS

Reference is now made to FIG. 1, which schematically illustrates a storage system 10, in accordance with an embodiment of the present invention. Storage system 10 receives, from one or more host computers 12, input/output (I/O) commands, comprising commands to read or write data at logical addresses on logical volumes. Host computers 12 are coupled to storage system 10 by any means known in the art, for example, via a network or by a bus. Herein, by way of example, host computers 12 and storage system 10 are assumed to be coupled by a network 14.

The logical addresses specify a range of data blocks within a logical volume, each block herein being assumed by way of example to contain 512 bytes. For example, a 10 KB data record used in a data processing application on a host computer would require 20 blocks, which the host computer might specify as being stored at a logical address comprising blocks 1000 through 1019 of a logical volume V1.
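The block arithmetic of the preceding example may be checked with a short sketch. The function names `blocks_needed` and `block_range` are hypothetical, and 1 KB is taken here to mean 1024 bytes, which is an assumption consistent with the 20-block figure given above.

```python
BLOCK_SIZE = 512  # bytes per block, as assumed by way of example in the text


def blocks_needed(record_bytes):
    """Number of whole blocks required to hold a record (rounds up)."""
    return -(-record_bytes // BLOCK_SIZE)


def block_range(first_block, record_bytes):
    """First and last block numbers occupied by a record."""
    count = blocks_needed(record_bytes)
    return first_block, first_block + count - 1


record_blocks = blocks_needed(10 * 1024)      # a 10 KB record
first, last = block_range(1000, 10 * 1024)    # stored starting at block 1000
```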

Storage system 10 typically operates in, or as, a network attached storage (NAS) or a storage area network (SAN) system. However, it will be understood that the scope of the present invention is not limited to storage systems operating in any particular configuration. Rather, the scope of the present invention includes systems operating in any suitable configuration used for storing data.

I/O commands to read data comprise two fields, a first field specifying the command type (i.e., read), and a second field specifying the logical address, which includes the logical volume. I/O commands to write data comprise three fields, a first field specifying the command type (i.e., write), a second field specifying the logical address, and a third field specifying the data that is to be written.

Storage system 10 comprises one or more caches, indicated in FIG. 1 as caches 18, 20, and 22. However, it will be appreciated that the number of caches used in system 10 may be any convenient number. Caches 18, 20, and 22 are distinguished from each other to facilitate the exposition of cache operation hereinbelow. All caches in system 10 are assumed to operate in substantially the same manner and to comprise substantially similar elements. Elements in the caches of the system, and operations of the caches, are described in more detail below with respect to FIG. 2.

Each of the caches is assumed to be approximately equal in size and is also assumed to be coupled, by way of example, in a one-to-one correspondence with a set of physical storage. Those skilled in the art will be able to adapt the description herein, mutatis mutandis, to caches and storage devices in other correspondences, such as the many-to-many correspondence described in US Patent Publication 2005/0015566, titled “Data Allocation in a Distributed Storage System,” which is assigned to the assignee of the present invention and which is incorporated herein by reference. Each set of physical storage comprises multiple slow and/or fast access time mass storage devices, hereinbelow assumed to be multiple hard disks. By way of example, FIG. 1 shows caches 18, 20, and 22 coupled to respective sets of physical storage 24, 26, and 28. In response to an I/O command, cache 18, by way of example, may read or write data at addressable physical locations of physical storage 24. A single addressable physical location, also referred to herein as a track, typically contains 128 data blocks.

In some embodiments of the present invention, a management node 30 of storage system 10 receives from a management module 32 a formation command to form a logical volume V1. The management module may be run from a dedicated external computing system or from one or more of the host computers. The purpose of the formation command is to permit host computers 12 to specify logical addresses of V1 in subsequent I/O commands.

In response to the formation command, management node 30 creates routing records which indicate how the logical addresses of V1 are to be distributed across caches 18, 20, and 22. The routing records do not specify the physical location on the disks of each logical address, but only the cache that is responsible for storing the associated data. In an embodiment of the present invention, the routing of logical addresses is implemented according to methods described in the above-referenced US Patent Publication 2005/0015566. According to the aforementioned methods, management node 30 assigns logical addresses to groups, herein referred to as partitions. Each partition may comprise a set of logical addresses equal in size to a track, namely 128 data blocks. Management node 30 determines the allocation of partitions among the one or more caches to provide an approximately equal number of partitions on each cache. The allocation is such that when data blocks of a logical volume are written to storage system 10, the blocks will be distributed in a balanced manner across all caches. Furthermore, the association of partitions with caches may be done in such a manner that the partitions of one logical volume associated with a specific cache, such as cache 18, may have the same identifying names, or numbers, as the partitions of additional logical volumes that are also associated with cache 18. That is, if a partition identified as P1 and comprising logical addresses of logical volume V1 is stored on cache 18, then partitions of additional volumes V2 and V3 with the identification of P1 may also be stored on cache 18.
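The two properties of the allocation described above, namely an approximately equal number of partitions on each cache and the placement of like-numbered partitions of different volumes on the same cache, may be illustrated by the following sketch. The routing method of US Patent Publication 2005/0015566 is not reproduced here. The modulo rule below is a stand-in assumed only for illustration, and the names `cache_for_partition` and `partitions_per_cache` are hypothetical.

```python
from collections import Counter

CACHES = (18, 20, 22)  # reference numerals of the caches shown in FIG. 1


def cache_for_partition(partition_number):
    """Stand-in routing rule (an assumption, not the referenced method):
    the cache depends on the partition number only, never on the volume,
    so partition n of every volume is routed to the same cache."""
    return CACHES[partition_number % len(CACHES)]


def partitions_per_cache(partition_count):
    """Count how many of the first partition_count partitions each cache gets."""
    return Counter(cache_for_partition(p) for p in range(partition_count))


balance = partitions_per_cache(100)
spread = max(balance.values()) - min(balance.values())
```

Because the rule ignores the volume name, a lookup of a given partition number resolves to the same cache for V1, V2, and V3.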

The routing records, indicating the association of logical addresses of logical volumes with partitions and the association of the partitions with caches, are distributed by the management node to one or more generally similar network interfaces of storage system 10. The network interfaces are indicated in FIG. 1 as three network interfaces 33, 34, and 35, but it will be understood that system 10 may comprise any convenient number of network interfaces.

Referring back to the formation command to form volume V1, management node 30 also distributes messages to caches 18, 20, and 22 instructing the caches to form V1. Implementation of the formation command by the caches is described further hereinbelow (FIG. 2).

Subsequent to the formation of V1, network interfaces 33, 34, and 35 receive I/O commands from host computers 12 specifying logical addresses of V1. The network interfaces use the routing records to break the commands into I/O instructions, or command subsets, that are then distributed among caches 18, 20, and 22. By way of example, network interface 33 may receive a command to read data at a logical address comprising blocks 1000 through 1019 of logical volume V1. Network interface 33 uses the routing records to convert the logical address (which comprises 20 blocks) to partition addresses, such as a first partition address comprising blocks 125 through 128 on a partition P5 of cache 18, and a second partition address comprising blocks 1 through 16 on a partition P6 of cache 20.
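The division of one logical address into several partition addresses may be sketched as follows. The sketch assumes, purely for illustration, that partition k holds volume blocks 128k through 128k+127, and it numbers blocks within a partition from 1 as in the example above. The actual correspondence between volume blocks, partitions such as P5 and P6, and caches is set by the routing records and is not reproduced. The function name `split_into_partition_addresses` is hypothetical.

```python
BLOCKS_PER_PARTITION = 128  # one partition is equal in size to a track


def split_into_partition_addresses(first_block, block_count):
    """Split a run of volume blocks into (partition, first, last) pieces,
    where first and last are 1-based block numbers inside the partition.
    Assumes contiguous numbering of partitions (an illustration only)."""
    pieces = []
    block = first_block
    remaining = block_count
    while remaining > 0:
        partition, offset = divmod(block, BLOCKS_PER_PARTITION)
        run = min(remaining, BLOCKS_PER_PARTITION - offset)
        pieces.append((partition, offset + 1, offset + run))
        block += run
        remaining -= run
    return pieces


# A 20-block run that straddles a partition boundary yields two pieces,
# of 4 blocks and 16 blocks, each of which would be sent to one cache.
straddling = split_into_partition_addresses(1020, 20)

# Under this simplified numbering, blocks 1000 through 1019 fall in one piece.
single = split_into_partition_addresses(1000, 20)
```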

Having determined the partition addresses associated with caches 18 and 20, network interface 33 then sends I/O instructions specifying the partition addresses to the respective caches 18 and 20. Each cache, upon receiving the respective instruction, then determines a physical location, i.e., a track, associated with the specified partition. Thus, following the example described above, cache 18 identifies the track associated with its partition P5, and cache 20 identifies the track associated with its partition P6. Each cache will then read data from the indicated track according to processes described further hereinbelow (FIG. 4).

Routing of commands from network interfaces 33, 34, and 35 to each cache is typically performed over a network and/or a switch. Herein, by way of example, the network interfaces are assumed to be coupled to the caches by a switch 36.

FIG. 2 is a schematic diagram of elements of cache 18 of FIG. 1, in accordance with an embodiment of the present invention. A control unit 38 performs the processing and communications functions of the cache. The control unit manages communications with the network interfaces over switch 36. Alternatively, in configurations of the present invention in which storage system 10 comprises only a single cache 18, control unit 38 communicates directly with host computers 12 over network 14. Control unit 38 also performs the tasks of reading and writing data to physical storage 24. The control unit determines tracks of the physical storage at which to read and write data, performing this determination by using partition descriptor records 40, herein referred to as PDRs, and by using configuration records 42, according to processes described hereinbelow (FIGS. 4 and 5). The PDRs of cache 18 associate the partitions allocated to cache 18 with tracks of physical storage 24.

Control unit 38 also communicates with management node 30. In response to management instructions to form or to copy logical volumes, the control unit creates configuration records 42. Configuration records comprise logical volume records 43, translation records 44, and meta-volume records 45. Configuration records, as well as PDRs, which are collectively referred to as meta-data, may be managed using multiple data management paradigms, such as relational tables or binary trees.

Cache 18 also comprises a data space 46, wherein data may be manipulated or temporarily stored during an I/O process. Cache 18 further comprises a partition hash table 48 used by control unit 38 to access PDRs.

An Appendix to the present disclosure, hereinbelow, details six sets of configuration records and PDRs of cache 18. Set 1 provides examples of configuration records and PDRs that may be defined in cache 18 when the data storage system stores three logical volumes, named respectively V1, V2, and V3. Sets 2 through 6 show how the configuration records and PDRs change during the implementation of I/O and management instructions.

As described above, the configuration records (i.e., logical volume records, translation records, and meta-volume records) are generated in response to instructions from management node 30. PDRs are created only in response to write instructions from the network interfaces.

As shown in Set 1, the first type of configuration records, the logical volume records, comprise three fields, these being a logical volume name field, a size field, measured in thousands of partitions, and a meta-volume field. It will be appreciated that the logical volume names used herein are for purposes of illustration only, as numeric values are typically used in computer communications. For clarity, additional fields comprised in a logical volume record, and which may be used for functions of the cache unrelated to the present invention, such as date and security key fields, are not shown.

Lines 1, 2, and 3 of Set 1 show the logical volume records for V1, V2, and V3. The logical volume records are created in response to volume formation instructions from the management node.

As indicated by the size fields of the aforementioned records, V1 and V3 have equivalent allocations of 100K partitions, which is approximately 6 gigabytes (GB). V2 is allocated 200K partitions. It should be noted that storage system 10 may be configured to operate in either a fixed allocation mode or a dynamic allocation mode. In a fixed allocation mode, management node 30 only implements a command to form or copy a logical volume if the physical storage coupled to the caches has free data space equal to or greater than the specified size of the logical volume. For example, a command from management module 32 to form logical volume V1 may specify a logical volume size of 18 GB. In the fixed allocation mode, the command is typically implemented by management node 30 only when caches 18, 20, and 22 have a total of at least 18 GB of free space available on their respective physical storage. By contrast, in the dynamic allocation mode, a logical volume may be formed substantially regardless of the amount of free data space available. Subsequently, caches may issue warnings to the management node when I/O operations cause free physical storage space to drop below a predetermined minimum.
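The size arithmetic and the difference between the two allocation modes may be sketched as follows. The sketch assumes that "K" denotes 1000 partitions and that a partition holds 128 blocks of 512 bytes, as stated above. The function names `volume_bytes` and `may_form_volume` and the mode strings are hypothetical.

```python
BLOCK_SIZE = 512               # bytes per block
BLOCKS_PER_PARTITION = 128     # blocks per partition (one track)
PARTITION_BYTES = BLOCK_SIZE * BLOCKS_PER_PARTITION  # 65,536 bytes


def volume_bytes(size_in_thousands_of_partitions):
    """Bytes allocated to a volume whose size field counts thousands of partitions."""
    return size_in_thousands_of_partitions * 1000 * PARTITION_BYTES


def may_form_volume(requested_bytes, free_bytes, mode):
    """Decide whether a form or copy command is implemented.
    'fixed': only if free physical space covers the requested size.
    'dynamic': regardless of the free space currently available."""
    if mode == "fixed":
        return free_bytes >= requested_bytes
    if mode == "dynamic":
        return True
    raise ValueError("mode must be 'fixed' or 'dynamic'")


GB = 10 ** 9
v1_on_one_cache = volume_bytes(100)                        # size field of 100
fixed_ok = may_form_volume(18 * GB, 17 * GB, "fixed")      # short by 1 GB
dynamic_ok = may_form_volume(18 * GB, 17 * GB, "dynamic")
```

Under these assumptions a size field of 100 corresponds to 6,553,600,000 bytes on one cache, which is of the same order as the approximate figure of 6 GB given above.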

The meta-volume fields of the three logical volume records (lines 1, 2, and 3) are either zero or null to indicate that the logical volumes are not associated with meta-volumes. The volumes are thus independent, meaning that they have not yet been used as a source or a target for logical volume copy commands, as described further hereinbelow.

Translation records of cache 18 (lines 4, 5, and 6) coincide with the abovementioned logical volume records and are also created by the formation command. Translation records comprise two fields, a volume field, within which may be written either a logical volume name or a meta-volume name, and an alias field, comprising an alias assigned to the volume or meta-volume. The translation records of Set 1 assign the alias A1 to V1, the alias A2 to V2, and the alias A3 to V3.

Set 1 shows several PDRs (lines 7 through 13) maintained by cache 18. During typical operation, the caches may maintain several hundred thousand PDRs or more. PDRs comprise four fields: an alias field, a partition name or identification (ID) field, a change counter field, and a physical address field. For example, in the PDR of line 7, the alias is A1, the partition identification is P1, the change counter is 0, and the physical address is PYYY01.

The change counter field of the PDRs is zero or null, as this field is only used for PDRs associated with meta-volumes, as described further hereinbelow.
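The three record types of Set 1 described above may be modeled as follows. The field layouts follow the text, but the class names, attribute names, and the function `physical_address` are hypothetical. Only the PDR of line 7 is reproduced, since the remaining PDRs are given in the Appendix. Keying the partition hash table by the pair (alias, partition ID) is an assumption, as the text does not state the key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogicalVolumeRecord:
    name: str
    size_k_partitions: int              # size in thousands of partitions
    meta_volume: Optional[str] = None   # None while the volume is independent


@dataclass
class TranslationRecord:
    volume: str   # a logical volume name or a meta-volume name
    alias: str


@dataclass
class PartitionDescriptorRecord:
    alias: str
    partition_id: str
    change_counter: int
    physical_address: str


# Lines 1 through 6 of Set 1, as described in the text.
volume_records = [
    LogicalVolumeRecord("V1", 100),
    LogicalVolumeRecord("V2", 200),
    LogicalVolumeRecord("V3", 100),
]
translation_records = [
    TranslationRecord("V1", "A1"),
    TranslationRecord("V2", "A2"),
    TranslationRecord("V3", "A3"),
]

# Line 7 of Set 1; the other PDRs of the Appendix are not reproduced here.
pdr_line_7 = PartitionDescriptorRecord("A1", "P1", 0, "PYYY01")
partition_hash_table = {(pdr_line_7.alias, pdr_line_7.partition_id): pdr_line_7}


def physical_address(volume, partition_id):
    """Translate volume name to alias, then look up the PDR in this table."""
    alias = next(t.alias for t in translation_records if t.volume == volume)
    pdr = partition_hash_table.get((alias, partition_id))
    return pdr.physical_address if pdr is not None else None
```

Because PDRs refer to aliases rather than to volume names, reassigning an alias in a translation record redirects every PDR that carries the alias without any PDR being rewritten.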

In embodiments of the present invention, operating in either the fixed or the dynamic allocation mode, the partition descriptor records are created only when data is actually written to physical storage. Thus the resources required by cache 18 for meta-data are proportional to the amount of data actually stored, not to the amount of data allocated for logical volumes.

Meta-volume records do not appear in Set 1, as meta-volumes only exist after the data storage system has implemented a copy volume command, as described hereinbelow.

FIG. 3 is a flowchart of a process 50 implemented when cache 18 receives an instruction to copy a source logical volume to a target logical volume, in accordance with an embodiment of the present invention.

At a step 52, cache 18 receives a copy instruction from management node 30 specifying that V3 is assigned as a copy of V1. Management node 30 issues this instruction after receiving the copy command from management module 32. In some embodiments, for example when a modular routing method described in US Patent Publication 2005/0015566 is utilized, routing records at the network interfaces may not need to be changed to facilitate implementation of V3 as a copy of V1. If routing records do need to be modified, management node 30 typically distributes new records as an atomic process.

Set 2 of the Appendix provides examples of the configuration records created by control unit 38 of cache 18 when the copy command is implemented. At a step 54, a meta-volume record (line 14 of Set 2) is created. The meta-volume record comprises four fields: a meta-volume field, which names the meta-volume, a size field, a logical volume counter, and a superior meta-volume field.

In the example indicated in Set 2, the meta-volume is created with the name MV1. The size is set equal to the size of the source volume V1, namely 100. The logical volume counter is set to 2, indicating that MV1 has two logical branch volumes, V1 and V3. Meta-volumes may be chained in a hierarchy; however, in Set 2, MV1 is not linked to any superior meta-volume. The superior meta-volume field is therefore blank or zero. Usage of the superior meta-volume field is described in further examples provided hereinbelow.

At a step 56, the meta-volume fields of both the V1 and V3 logical volume records are modified to associate both logical volumes with MV1 (lines 15 and 17 of Set 2). At an assign alias step 58, translation records are updated and created as necessary. A new translation record is created for MV1 (line 18). MV1 is assigned the alias A1, which was previously assigned to the source volume V1. V1 is assigned a new alias A4 (line 19).

As shown in Set 2, cache 18 creates no new PDRs in response to the copy command. However, because V3 is reset to reflect the data of V1, the PDRs previously associated with V3 (i.e., the PDRs associated with alias A3, in lines 12 and 13) are deleted (as indicated by the comment “Deleted” in lines 27 and 28).

In the event that steps 52 through 58 have completed successfully, cache 18 provides management node 30 with an acknowledgement of success at a step 60, after which process 50 is complete. Subsequently, management node 30, responsively to acknowledgements from all caches 18, 20, and 22, sends an acknowledgement to management module 32.

It will be understood that implementation of the copy command may be configured within storage system 10 to be an atomic process. It will also be understood that the duration of the process is independent of the size of the volume being copied, and that the process is substantially instantaneous since the only activities performed are generation or updating of a few configuration records. As is shown further below, the properties of atomicity, size independence, and instantaneity apply no matter how many copies of a volume are made.
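Steps 54 through 58 of process 50 may be sketched as follows for the case of an independent source volume, transforming the records of Set 1 into those of Set 2. The dictionary layout and the function name copy_volume are hypothetical; the sketch does not cover the case in which the source already belongs to a meta-volume, nor the acknowledgement of step 60.

```python
def copy_volume(cache, source, target, mv_name, new_alias):
    """Apply steps 54 through 58 of process 50 to an independent source."""
    lvs = cache["logical_volumes"]
    trans = cache["translations"]
    pdrs = cache["pdrs"]

    # Step 54: create the meta-volume record, sized like the source,
    # with two branches and no superior meta-volume.
    cache["meta_volumes"][mv_name] = {
        "size": lvs[source]["size"],
        "lv_counter": 2,
        "superior": None,
    }

    # Step 56: link both logical volumes to the meta-volume.
    lvs[source]["meta_volume"] = mv_name
    lvs[target]["meta_volume"] = mv_name

    # Step 58: the meta-volume takes over the source's alias (and thereby
    # its PDRs); the source is assigned a new alias.
    trans[mv_name] = trans[source]
    trans[source] = new_alias

    # The target is reset to reflect the source, so its PDRs are deleted.
    # (A linear scan is used here for brevity only.)
    target_alias = trans[target]
    for key in [k for k in pdrs if k[0] == target_alias]:
        del pdrs[key]


# Records of Set 1 of the Appendix.
cache = {
    "meta_volumes": {},
    "logical_volumes": {
        "V1": {"size": 100, "meta_volume": None},
        "V2": {"size": 200, "meta_volume": None},
        "V3": {"size": 100, "meta_volume": None},
    },
    "translations": {"V1": "A1", "V2": "A2", "V3": "A3"},
    # (alias, partition ID) -> [change counter, physical address]
    "pdrs": {
        ("A1", "P1"): [0, "PYYY01"],
        ("A1", "P5"): [0, "PYYY02"],
        ("A1", "P9"): [0, "PYYY03"],
        ("A2", "P1"): [0, "PYYY04"],
        ("A2", "P5"): [0, "PYYY05"],
        ("A3", "P1"): [0, "PYYY06"],
        ("A3", "P5"): [0, "PYYY07"],
    },
}

copy_volume(cache, "V1", "V3", "MV1", "A4")
```

After the call, the sketch holds the state listed in Set 2: no PDR has been created, the PDRs of alias A3 are gone, and the PDRs formerly belonging to V1 are now reached through MV1 by way of alias A1.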

FIG. 4 is a flowchart of a process 70 implemented by cache 18 upon receiving an instruction to read data, subsequent to implementing process 50, in accordance with an embodiment of the present invention. At an initial step 72, control unit 38 of cache 18 receives a data read instruction from one of network interfaces 33, 34, and 35, the instruction typically being generated in response to a read command from one of host computers 12. By way of example, the instruction is assumed to be received from network interface 33. The instruction is further assumed to be a request to read data at blocks 125 through 128 of partition P1 of V1.

At a translation step 74, control unit 38 translates the logical volume name V1 to the alias A4, according to the V1 translation record (line 19 of Set 2). At a subsequent decision step 76, the control unit determines whether there exists a PDR associated with P1 of A4, i.e., a PDR in which the value of the partition ID field is P1 and the value of the alias field is A4. Partition hash table 48 (FIG. 2) is used to facilitate the search through the PDR records, since, as stated above, there are typically many records.

Assuming the meta-data status indicated by Set 2, no PDR exists for P1 of A4. The “no” branch of step 76 is therefore followed, and processing continues at a step 78. At this step, control unit 38 checks the meta-volume link for V1, i.e., the meta-volume field of the V1 logical volume record (line 15 of Set 2). The record shows that V1 is linked to MV1. The translation record for MV1 (line 18) associates MV1 with alias A1.

Processing continues at decision step 80, at which the control unit seeks a PDR for P1 associated with alias A1. As in step 76, partition hash table 48 is used to facilitate the search through the PDR records.

A PDR does exist for P1 of A1 (line 22 of Set 2). Processing thus continues at a read data step 82, rather than reiterating steps 78 and 80. At step 82, the control unit reads data into data space 46 from the track PYYY01 indicated by the PDR. The control unit then outputs blocks 125 through 128 of the track to network interface 33, thereby satisfying the request of the read instruction and completing process 70.

It will be understood that the same process flow would be followed to implement an instruction requesting data from partition P1 of V3, the second branch of MV1. By contrast, an instruction specifying data of partition P1 of logical volume V2 would be processed without following the meta-volume link in step 78, because a PDR exists for P1 of V2 (line 25 of Set 2). Consequently, the PDR would be found at decision step 76, and processing would continue directly to read data step 82.
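The PDR lookup of process 70 (steps 74 through 80) may be sketched as follows, using the records of Set 2. The function name find_track and the dictionary layout are hypothetical, and a Python dictionary stands in for partition hash table 48; the transfer of blocks at step 82 is not modeled.

```python
def find_track(cache, volume, partition_id):
    """Return the physical address holding a partition, or None."""
    trans = cache["translations"]
    pdrs = cache["pdrs"]

    # Steps 74 and 76: translate the volume to its alias and seek a PDR.
    pdr = pdrs.get((trans[volume], partition_id))

    # Steps 78 and 80: follow the meta-volume link, repeating for each
    # superior meta-volume until a PDR is found or the chain ends.
    mv = cache["logical_volumes"][volume]["meta_volume"]
    while pdr is None and mv is not None:
        pdr = pdrs.get((trans[mv], partition_id))
        mv = cache["meta_volumes"][mv]["superior"]

    return None if pdr is None else pdr[1]


# Records of Set 2 of the Appendix.
cache = {
    "meta_volumes": {"MV1": {"size": 100, "lv_counter": 2, "superior": None}},
    "logical_volumes": {
        "V1": {"size": 100, "meta_volume": "MV1"},
        "V2": {"size": 200, "meta_volume": None},
        "V3": {"size": 100, "meta_volume": "MV1"},
    },
    "translations": {"MV1": "A1", "V1": "A4", "V2": "A2", "V3": "A3"},
    # (alias, partition ID) -> [change counter, physical address]
    "pdrs": {
        ("A1", "P1"): [0, "PYYY01"],
        ("A1", "P5"): [0, "PYYY02"],
        ("A1", "P9"): [0, "PYYY03"],
        ("A2", "P1"): [0, "PYYY04"],
        ("A2", "P5"): [0, "PYYY05"],
    },
}

v1_p1 = find_track(cache, "V1", "P1")   # found via MV1 (alias A1)
v3_p1 = find_track(cache, "V3", "P1")   # same track, second branch of MV1
v2_p1 = find_track(cache, "V2", "P1")   # found directly at step 76
v2_p9 = find_track(cache, "V2", "P9")   # never written: no PDR anywhere
```

In this sketch both branches of MV1 resolve partition P1 to the same track, whereas the independent volume V2 resolves its partition without consulting any meta-volume.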

FIG. 5 is a flowchart of a process 90 implemented by cache 18 upon receiving an instruction to write data, subsequent to implementing process 50, in accordance with an embodiment of the present invention. Prior to implementation of process 90, Set 2 is assumed to reflect the status of meta-data of cache 18. Following implementation of process 90, the meta-data status is reflected by Set 3.

Implementation of the write instruction by process 90 is similar to implementation of the read instruction carried out through process 70 of FIG. 4. At an initial step 92, control unit 38 of cache 18 receives a data write instruction from one of network interfaces 33, 34, and 35. By way of example, the instruction is assumed to be received from network interface 33. Furthermore, for the sake of illustration, the instruction is assumed to be derived from a write command specifying data that is to be written to blocks 125 through 128 of partition P1 of V1.

At a translation step 94, control unit 38 of cache 18 translates logical volume V1 to alias A4, according to the V1 translation record (line 19 of Set 2). Next, at a decision step 96, control unit 38 determines whether there exists a PDR associated with P1 of A4. As described above, partition hash table 48 (FIG. 2) is used to facilitate the search through the PDR records.

Because no PDR exists for P1 of A4, processing continues at a step 98. The meta-volume field of the V1 logical volume record (line 15 of Set 2) associates V1 with the meta-volume MV1, whose alias is A1. Processing continues at a decision step 100, at which the control unit seeks a PDR for P1 of MV1 (specified by the alias A1).

This PDR does exist (line 22 of Set 2), and specifies that the data is stored at a track PYYY01. Processing thus continues at a write data step 102. This step comprises first reading the 128 blocks of track PYYY01 into data space 46 and then modifying the data at blocks 125 through 128, according to the data received from the network interface, so that the modified partition can then be rewritten to physical storage.

At this point, the modified partition cannot be written back to track PYYY01, because another branch of the meta-volume (i.e., V3) still references the unmodified data at track PYYY01. Consequently, a new track is allocated to store the new data partition. A new PDR is created (line 42 of Set 3), indicating that the new data is stored at a track PYYY06.

At an update step 104, the change counter field of the PDR of P1 of MV1 (line 37 of Set 3) is incremented from zero to one, to reflect that one of the branches of MV1, i.e., V1, no longer references this PDR. When the P1 partitions of all logical volumes referencing MV1 have been modified, the P1 PDR of MV1 may be deleted. Furthermore, when all partitions referencing MV1 have been modified, the meta-volume itself may be deleted.

Assuming that all prior steps have been completed successfully, cache 18 returns an acknowledgement of successful completion to network interface 33 at step 106.

After the P1 PDR of V1 has been created, a subsequent command to write to this P1 partition will be implemented by accessing the PDR with the new V1 alias A4, rather than by using the MV1 alias. In other words, the control unit will identify the PDR at the first PDR search step 96, and implementation of process 90 will continue at a step 108, at which the new data will be written to the same physical location as that indicated by the PDR, namely PYYY06.
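The meta-data handling of process 90 may be sketched as follows, transforming the records of Set 2 into those of Set 3. The function name write_partition and the free_tracks list (a stand-in for the track allocator) are hypothetical; the sketch models only the records, not the reading and modification of blocks in data space 46.

```python
def write_partition(cache, volume, partition_id, free_tracks):
    """Update meta-data for a write; return the track that receives the data."""
    trans = cache["translations"]
    pdrs = cache["pdrs"]
    own_key = (trans[volume], partition_id)

    # Steps 94 and 96: if the volume already has its own PDR, the data is
    # rewritten in place (step 108).
    if own_key in pdrs:
        return pdrs[own_key][1]

    # Steps 98 and 100: seek the shared PDR among the superior meta-volumes.
    mv = cache["logical_volumes"][volume]["meta_volume"]
    while mv is not None:
        shared = pdrs.get((trans[mv], partition_id))
        if shared is not None:
            # Step 104: one more branch no longer references this PDR.
            shared[0] += 1
            break
        mv = cache["meta_volumes"][mv]["superior"]

    # Step 102: the shared track must be preserved for the other branches,
    # so a new track is allocated and a new PDR is created for the volume.
    new_track = free_tracks.pop(0)
    pdrs[own_key] = [0, new_track]
    return new_track


# Records of Set 2 of the Appendix (PDRs of V2 omitted for brevity).
cache = {
    "meta_volumes": {"MV1": {"size": 100, "lv_counter": 2, "superior": None}},
    "logical_volumes": {
        "V1": {"size": 100, "meta_volume": "MV1"},
        "V3": {"size": 100, "meta_volume": "MV1"},
    },
    "translations": {"MV1": "A1", "V1": "A4", "V3": "A3"},
    # (alias, partition ID) -> [change counter, physical address]
    "pdrs": {("A1", "P1"): [0, "PYYY01"], ("A1", "P5"): [0, "PYYY02"]},
}
free_tracks = ["PYYY06", "PYYY07"]

first_write = write_partition(cache, "V1", "P1", free_tracks)
second_write = write_partition(cache, "V1", "P1", free_tracks)
```

Under these assumptions the first write allocates track PYYY06 and increments the change counter of the MV1 PDR, while the second write finds the PDR of alias A4 at once and reuses the same track without touching the counter.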

FIG. 6 displays three exemplary diagrams of relationships between logical volumes and meta-volumes, in accordance with an embodiment of the present invention. In a first diagram 112, the three logical volumes V1, V2, and V3 are shown as independent entities, corresponding to a state indicated by Set 1 of the Appendix. A second diagram 114 shows the hierarchical relationship created after a logical volume copy command is performed, designating V3 as a copy of V1, and corresponds to a state indicated by Set 2 of the Appendix. As shown, V1 and V3 become branches of MV1, while V2 remains independent.

Below are descriptions of further applications of processes 70 and 90 (FIGS. 4 and 5) illustrating how PDRs and configuration records are modified.

Assuming that the records of cache 18 are as shown in Set 3 of the Appendix, upon receiving a second copy command designating V2 as a copy of V1, control unit 38 creates a second meta-volume, MV2. To implement the second copy command, the control unit again follows process 50, creating a meta-volume record for MV2, updating links of the logical volumes to reference MV2, and creating new aliases for MV2 and for V1. Set 4 of the Appendix shows the specific modifications made to the configuration records. At step 54, an MV2 record is created (line 44 of Set 4), and the “superior meta-volume” field of the newly created MV2 record is set to MV1. The logical volume counter for MV2 is set to two, indicating the two branches of MV2, V1 and V2. The logical volume counter for MV1 is incremented to three (line 43), indicating that three logical volumes all originate from the same initial source and reference data indicated by PDRs of MV1. The size of MV2 is set equal to the size of V1. Note that V2 is larger than V1, which means that V2 may store more data than was planned for V1.

Further modifications to the configuration records of cache 18 comprise assigning MV2 to the meta-volume fields of the V1 and V2 logical volume records at step 56 (lines 45 and 46), and revising the translation records to assign new aliases at step 58. MV2 is assigned the prior alias of V1, namely A4 (line 49), and V1 is assigned a new alias A5 (line 50). Changes to the PDRs comprise deleting former PDRs of V2, because V2 is reset to reflect the data of V1, and incrementing the change counter of the P1 PDR of MV1 (line 53), because the P1 partition of V1 does not reference this PDR.

As indicated in a third diagram 116 of FIG. 6, implementation of the second copy command causes MV2 to take the place of V1 as a subordinate volume of MV1. It may be understood from the description above that additional meta-volumes may be added to the hierarchy represented by diagram 116, such that an extremely large chain of copies may be generated. The only resources utilized by each copy are the few bytes required for the additional configuration records.

It may be further understood that process 90 illustrates a “bottom-up” algorithm for seeking a PDR, whereby the iterative search begins by seeking a PDR associated with the logical volume (step 96), and then continues by iteratively seeking PDRs associated with superior meta-volumes. Alternatively, a “top-down” algorithm may be used, whereby a PDR is first sought for the most superior meta-volume of a logical volume (e.g., MV1 is the most superior meta-volume of V1 in diagram 116 of FIG. 6). Subsequently, each subordinate volume is checked until the PDR is found.

After implementing the second copy command, control unit 38 may receive further write commands, which again trigger process 90 of FIG. 5. Set 5 of the Appendix shows PDRs that are defined by control unit 38 after subsequent write instructions have been implemented, rewriting partitions P5 of V1, V2, and V3. Before the implementation of these write instructions the three P5 partitions are associated with MV1 and stored at a physical location PYYY02 (line 38). Following implementation of the write instructions, new physical locations are used to store the three partitions, as indicated by the three new PDRs at lines 73 through 75 of Set 5.

After each successive write, the control unit increments the change counter of the P5 PDR of MV1 (line 70), such that the change counter is incremented to 3. Control unit 38 compares the change counter with the MV1 logical volume counter (line 59 of Set 5), to determine whether the PDR is referenced by subordinate volumes. In this case, the counters are equal, indicating that the P5 PDR of MV1 is not needed. The PDR is thus deleted (as indicated by the comment “Deleted” on line 70).
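The comparison of the change counter with the logical volume counter may be sketched as follows for the P5 PDR of MV1. The function name record_divergence and the dictionary layout are hypothetical, and release of the underlying track is not modeled.

```python
def record_divergence(cache, mv, partition_id):
    """Increment the change counter of a meta-volume PDR.

    Returns True if the PDR was deleted because every branch of the
    meta-volume now holds its own copy of the partition.
    """
    key = (cache["translations"][mv], partition_id)
    pdr = cache["pdrs"][key]
    pdr[0] += 1
    if pdr[0] == cache["meta_volumes"][mv]["lv_counter"]:
        del cache["pdrs"][key]
        return True
    return False


# Relevant records before the three P5 writes (compare Set 4).
cache = {
    "meta_volumes": {"MV1": {"lv_counter": 3}},
    "translations": {"MV1": "A1"},
    # (alias, partition ID) -> [change counter, physical address]
    "pdrs": {("A1", "P5"): [0, "PYYY02"]},
}

# One call per write to P5 of V1, V2, and V3.
deleted = [record_divergence(cache, "MV1", "P5") for _ in range(3)]
```

In this sketch the PDR survives the first two writes and is removed on the third, when the change counter reaches the logical volume counter of 3.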

Set 6 of the Appendix lists configuration records of cache 18 after additional write commands are implemented, causing P1 partitions of V1 and of V2 to be modified. After the partitions for both these branches of MV2 have been changed, the P1 PDR of MV2 (with alias A4, on line 88) has a change counter value of 2, equal to the logical volume counter of the MV2 meta-volume record (line 77). Consequently, the PDR is no longer needed and is deleted by control unit 38 (indicated by the comment “Deleted” on line 88).

Because this is the only PDR associated with MV2, the meta-volume record itself (line 77) and the translation record (line 82) for MV2 are also deleted (as indicated by the comment “Deleted” on the respective lines in Set 6). Finally, the logical volume records for V1 and V2, which formerly referenced MV2 in their meta-volume field (lines 61 and 62 of Set 5), are linked instead to MV1 (lines 78 and 79).
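The removal of a meta-volume that has no remaining PDRs may be sketched as follows, using the MV2 records of Sets 5 and 6. The function name retire_meta_volume_if_unused and the dictionary layout are hypothetical; relinking of any meta-volumes subordinate to the removed one is not addressed in this sketch.

```python
def retire_meta_volume_if_unused(cache, mv):
    """Delete a meta-volume that owns no PDRs and relink its branches.

    Returns True if the meta-volume was deleted.
    """
    trans = cache["translations"]
    alias = trans[mv]
    if any(key[0] == alias for key in cache["pdrs"]):
        return False  # at least one PDR is still reached through this alias

    # Branches of the deleted meta-volume are linked to its superior.
    superior = cache["meta_volumes"][mv]["superior"]
    for record in cache["logical_volumes"].values():
        if record["meta_volume"] == mv:
            record["meta_volume"] = superior

    # Delete the meta-volume record and its translation record.
    del cache["meta_volumes"][mv]
    del trans[mv]
    return True


# Relevant records of Set 5, with the P1 PDR of MV2 (alias A4) still present.
cache = {
    "meta_volumes": {
        "MV1": {"lv_counter": 3, "superior": None},
        "MV2": {"lv_counter": 2, "superior": "MV1"},
    },
    "logical_volumes": {
        "V1": {"meta_volume": "MV2"},
        "V2": {"meta_volume": "MV2"},
        "V3": {"meta_volume": "MV1"},
    },
    "translations": {"MV1": "A1", "MV2": "A4", "V1": "A5", "V2": "A2", "V3": "A3"},
    "pdrs": {("A1", "P1"): [2, "PYYY01"], ("A4", "P1"): [2, "PYYY06"]},
}

retired_early = retire_meta_volume_if_unused(cache, "MV2")
del cache["pdrs"][("A4", "P1")]   # the last PDR of MV2 is deleted
retired_late = retire_meta_volume_if_unused(cache, "MV2")
```

After the second call the sketch matches Set 6: the MV2 records are gone and V1, V2, and V3 are all direct branches of MV1, whose logical volume counter remains 3.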

Although the embodiments described hereinabove relate to a distributed data storage system serving host computers over a network, it will be appreciated that the principles of the present invention may also be applied, mutatis mutandis, to storage systems in other configurations, such as stand-alone systems serving individual or multiple hosts. Furthermore, although the association of a meta-volume with tracks is implemented hereinabove by translation records, other methods of association may be envisioned. For example, the partition descriptor records may include a volume ID field rather than an alias field, and a copy command may be implemented by changing the field in each PDR to reflect a meta-volume ID. The methods described hereinabove may also be applied to additional data storage management commands such as a command to copy a source volume to multiple target volumes, some of which may be read-only volumes. It will thus be appreciated that the embodiments described above are cited by way of example, and the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and sub-combinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.

APPENDIX

Set 1: Sample configuration records for three logical volumes, V1, V2, and V3. No logical volume is linked to a meta-volume.

Logical Volume Records
Line #   Logical Volume   Size   Meta-Volume   Comments
1        V1               100    0
2        V2               200    0
3        V3               100    0

Translation Records
Line #   Volume   Alias   Comments
4        V1       A1
5        V2       A2
6        V3       A3

Partition Descriptor Records
Line #   Alias   Partition ID   Change Counter   Physical Address   Comments
7        A1      P1             0                PYYY01
8        A1      P5             0                PYYY02
9        A1      P9             0                PYYY03
10       A2      P1             0                PYYY04
11       A2      P5             0                PYYY05
12       A3      P1             0                PYYY06
13       A3      P5             0                PYYY07

Set 2: Sample records for logical volumes, V1, V2, and V3, after V3 is designated a copy of V1. V1 and V3 are designated as branches of a common meta-volume, MV1. PDRs of V3 are deleted.

Meta-Volume Records
Line #   Meta-Volume   Size   Logical Volume Counter   Superior Meta-Volume   Comments
14       MV1           100    2                        0                      New record

Logical Volume Records
Line #   Logical Volume   Size   Meta-Volume   Comments
15       V1               100    MV1           Modified
16       V2               200    0
17       V3               100    MV1           Modified

Translation Records
Line #   Volume   Alias   Comments
18       MV1      A1      New record
19       V1       A4      Modified
20       V2       A2
21       V3       A3

Partition Descriptor Records
Line #   Alias   Partition ID   Change Counter   Physical Address   Comments
22       A1      P1             0                PYYY01
23       A1      P5             0                PYYY02
24       A1      P9             0                PYYY03
25       A2      P1             0                PYYY04
26       A2      P5             0                PYYY05
27       A3      P1             0                PYYY06             Deleted
28       A3      P5             0                PYYY07             Deleted

Set 3: Sample records after writing to partition P1 of V1 (A4).

Meta-Volume Records
Line #   Meta-Volume   Size   Logical Volume Counter   Superior Meta-Volume   Comments
29       MV1           100    2                        0

Logical Volume Records
Line #   Logical Volume   Size   Meta-Volume   Comments
30       V1               100    MV1
31       V2               200    0
32       V3               100    MV1

Translation Records
Line #   Volume   Alias   Comments
33       MV1      A1
34       V1       A4
35       V2       A2
36       V3       A3

Partition Descriptor Records
Line #   Alias   Partition ID   Change Counter   Physical Address   Comments
37       A1      P1             1                PYYY01             Modified
38       A1      P5             0                PYYY02
39       A1      P9             0                PYYY03
40       A2      P1             0                PYYY04
41       A2      P5             0                PYYY05
42       A4      P1             0                PYYY06             New record

Set 4: Sample records of logical volumes, V1, V2, and V3, after V2 is designated a copy of V1. PDRs of V2 (A2) are deleted.

Meta-Volume Records
Line #   Meta-Volume   Size   Logical Volume Counter   Superior Meta-Volume   Comments
43       MV1           100    3                        0                      Modified
44       MV2           100    2                        MV1                    New record

Logical Volume Records
Line #   Logical Volume   Size   Meta-Volume   Comments
45       V1               100    MV2           Modified
46       V2               200    MV2           Modified
47       V3               100    MV1

Translation Records
Line #   Volume   Alias   Comments
48       MV1      A1
49       MV2      A4      New record
50       V1       A5      Modified
51       V2       A2
52       V3       A3

Partition Descriptor Records
Line #   Alias   Partition ID   Change Counter   Physical Address   Comments
53       A1      P1             2                PYYY01             Modified
54       A1      P5             0                PYYY02
55       A1      P9             0                PYYY03
56       A2      P1             0                PYYY04             Deleted
57       A2      P5             0                PYYY05             Deleted
58       A4      P1             0                PYYY06

Set 5: Sample records after writing to partitions P5 of V1 (A5), V2 (A2), and V3 (A3). PDR for P5 of MV1 (A1) is deleted because change counter equals logical volume counter.

Meta-Volume Records
Line #   Meta-Volume   Size   Logical Volume Counter   Superior Meta-Volume   Comments
59       MV1           100    3                        0
60       MV2           100    2                        MV1

Logical Volume Records
Line #   Logical Volume   Size   Meta-Volume   Comments
61       V1               100    MV2
62       V2               200    MV2
63       V3               100    MV1

Translation Records
Line #   Volume   Alias   Comments
64       MV1      A1
65       MV2      A4
66       V1       A5
67       V2       A2
68       V3       A3

Partition Descriptor Records
Line #   Alias   Partition ID   Change Counter   Physical Address   Comments
69       A1      P1             2                PYYY01
70       A1      P5             3                PYYY02             Deleted
71       A1      P9             0                PYYY03
72       A4      P1             0                PYYY06
73       A5      P5             0                PYYY04             New record
74       A2      P5             0                PYYY05             New record
75       A3      P5             0                PYYY07             New record

Set 6: Sample records after writing to partitions P1 of V1 (A5) and V2 (A2). PDR for P1 of MV2 (A4) is deleted, and meta-volume and translation records for MV2 are deleted.

Meta-Volume Records
Line #   Meta-Volume   Size   Logical Volume Counter   Superior Meta-Volume   Comments
76       MV1           100    3                        0
77       MV2           100    2                        MV1                    Deleted

Logical Volume Records
Line #   Logical Volume   Size   Meta-Volume   Comments
78       V1               100    MV1           Modified
79       V2               200    MV1           Modified
80       V3               100    MV1

Translation Records
Line #   Volume   Alias   Comments
81       MV1      A1
82       MV2      A4      Deleted
83       V1       A5
84       V2       A2
85       V3       A3

Partition Descriptor Records
Line #   Alias   Partition ID   Change Counter   Physical Address   Comments
86       A1      P1             2                PYYY01
87       A1      P9             0                PYYY03
88       A4      P1             2                PYYY06             Deleted
89       A5      P5             0                PYYY04
90       A2      P5             0                PYYY05
91       A3      P5             0                PYYY07
92       A5      P1             0                PYYY08             New record
93       A2      P1             0                PYYY09             New record

1. A method for copying a logical volume in a data storage system, comprising: forming a first logical volume; storing in physical storage of the data storage system a quantity of data of the first logical volume; receiving a first command to copy the first logical volume to a second logical volume; responsively to the first command, forming meta-data having a size that is independent of the quantity of the data; receiving a second command to access the data; and responsively to the second command, using the meta-data to access the data.
2. The method of claim 1, wherein forming the meta-data comprises creating no partition descriptor records.
3. The method of claim 1, wherein the meta-data comprises a meta-volume and wherein using the meta-data comprises seeking a partition descriptor record associated with the meta-volume.
4. The method of claim 1, wherein the meta-data comprises a meta-volume and wherein forming the meta-data comprises associating the first and second logical volumes and the meta-volume.
5. The method of claim 4, wherein associating the first and second logical volumes and the meta-volume comprises establishing a search order among the first and second logical volumes and the meta-volume, and wherein accessing the data comprises following the search order to seek a partition descriptor record.
6. The method of claim 1, wherein receiving the second command comprises receiving a write command and accessing the data comprises modifying the data.
7. The method of claim 6, wherein the data comprises original data, wherein storing the original data comprises writing the original data to a first physical location in the physical storage, and wherein modifying the original data comprises writing modified data to a second physical location in the physical storage.
8. The method of claim 7, wherein modifying the original data comprises creating a partition descriptor record associating a meta-volume with the first physical location.
9. The method of claim 7, wherein modifying the original data comprises testing a change flag indicating whether the original data is to be preserved.
10. Apparatus for copying a logical volume in a data storage system, comprising: physical storage; and a control unit, which is adapted to: form a first logical volume; store in the physical storage a quantity of data of the first logical volume; receive a first command to copy the first logical volume to a second logical volume; responsively to the first command, form meta-data having a size that is independent of the quantity of the data; receive a second command to access the data; and responsively to the second command, use the meta-data to access the data.
11. The apparatus of claim 10, wherein the control unit is adapted to form the meta-data by creating no partition descriptor records.
12. The apparatus of claim 10, wherein the meta-data comprises a meta-volume and wherein the control unit is adapted to use the meta-data to seek a partition descriptor record associated with the meta-volume.
13. The apparatus of claim 10, wherein the meta-data comprises a meta-volume and wherein the control unit is adapted to form the meta-data by associating the first and second logical volumes and the meta-volume.
14. The apparatus of claim 13, wherein the control unit is adapted: to establish a search order among the first and second logical volumes and the meta-volume; and to follow the search order during a search for a partition descriptor record.
15. The apparatus of claim 10, wherein the second command comprises receiving a write command and wherein the control unit is adapted responsively to the write command to modify the data.
16. The apparatus of claim 15, wherein the data comprises original data, and wherein the control unit is adapted: to store the original data by writing the original data to a first physical location in the physical storage; and to modify the original data by writing modified data to a second physical location in the physical storage.
17. The apparatus of claim 16, wherein the control unit is adapted responsively to receiving the write command to create a partition descriptor record associating a meta-volume with the first physical location.
18. The apparatus of claim 16, wherein the control unit is adapted responsively to receiving the write command to test a change flag indicating whether the original data is to be preserved.